Abstract

The universality phenomenon asserts that the distribution of the eigenvalues of a random
matrix with iid zero mean, unit variance entries does not depend on the underlying structure of
the random entries. For example, a plot of the eigenvalues of a random sign matrix, where each
entry is +1 or -1 with equal probability, looks the same as an analogous plot of the eigenvalues
of a random matrix where each entry is complex Gaussian with zero mean and unit variance.
In the current paper, we prove a universality result for sparse random n by n matrices where
each entry is non-zero with probability $1/n^{1-\alpha}$, where $0 < \alpha \le 1$ is any constant. The sparse
universality result proves convergence in probability and has one additional hypothesis that the
real and imaginary parts of the entries are independent (this hypothesis is most likely an artifact
of the proof). One consequence of the sparse universality principle is that the circular law holds
for sparse real random matrices so long as the entries have zero mean and unit variance, which
is the most general result for sparse real matrices to date.



1 Introduction 

Given an n by n complex matrix $A$, we define the empirical spectral distribution (which we will
abbreviate ESD) to be the following discrete probability measure on $\mathbb{C}$:

$$\mu_A(z) := \frac{1}{n} \left| \{ 1 \le i \le n : \mathrm{Re}(\lambda_i) \le \mathrm{Re}(z) \text{ and } \mathrm{Im}(\lambda_i) \le \mathrm{Im}(z) \} \right|,$$

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ with multiplicity. In this paper, we focus on the case
where $A$ is chosen from a probability distribution on $M_n(\mathbb{C})$, the set of all n by n complex matrices,
and thus $\mu_A$ is a randomly generated discrete probability measure on $\mathbb{C}$.
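
As a concrete reading of the definition above, the following minimal Python sketch evaluates the ESD at a point from a given list of eigenvalues. The function name `esd` and the example eigenvalues are illustrative assumptions rather than notation from the paper, and the non-strict inequalities follow the usual convention for distribution functions.

```python
def esd(eigenvalues, z):
    """Fraction of eigenvalues lam with Re(lam) <= Re(z) and Im(lam) <= Im(z)."""
    n = len(eigenvalues)
    count = sum(1 for lam in eigenvalues
                if lam.real <= z.real and lam.imag <= z.imag)
    return count / n

# Hypothetical example: the eigenvalues of a triangular matrix are its diagonal entries.
eigs = [complex(1, 1), complex(-1, 0), complex(0, 0.5)]
value = esd(eigs, complex(0.5, 1.0))  # two of the three eigenvalues qualify
```

Computing the eigenvalues themselves is left out of the sketch; any numerical eigenvalue routine could supply the list.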



1.1 Background: universality and the circular law 

Suppose that $A_n$ is an n by n matrix with iid random entries, each having zero mean and unit
variance. The distribution of the eigenvalues of $(1/\sqrt{n})A_n$ approaches the uniform distribution on
the unit disk as n goes to infinity, a phenomenon known as the circular law. The non-sparse circular
law has been proven in many special cases by many authors, including Mehta [15] (Gaussian case),
Girko [8], Edelman [6] (real Gaussian case), Bai [1] and Bai-Silverstein [2] (continuous case with
bounded $(2+\delta)$th moment, for $\delta > 0$), Götze-Tikhomirov [9] (sub-Gaussian case) and [10] (bounded
$(2+\delta)$th moment, for $\delta > 0$), Pan-Zhou [17] (bounded 4th moment), and Tao-Vu [25] (bounded
$(2+\delta)$th moment, for $\delta > 0$). The following, due to Tao and Vu [27, Theorem 1.10], is the current
best result, requiring only zero mean and unit variance.

Theorem 1.1 (Non-sparse circular law). [27, Theorem 1.10] Let $X_n$ be the n by n random matrix
whose entries are iid complex random variables with mean zero and variance one. Then the ESD
of $\frac{1}{\sqrt{n}} X_n$ converges (both in probability and in the almost sure sense) to the uniform distribution
on the unit disk.

Proving convergence in the almost sure sense is in general harder than proving convergence in 
probability, and in the current paper, we will focus exclusively on convergence in probability. See 
Subsection 1.4 towards the end of the introduction for a description of convergence in probability
and in the almost sure sense for the current context. 

In [27], Tao and Vu ask the following natural question: what analog of Theorem 1.1 is possible
in the case where the matrix is sparse, where entries become more likely to be zero as n increases,
instead of entries having the same distribution for all n? One goal of the current paper is to provide
an answer to this question in the form of Theorem 1.6 (see below), which proves the circular law for
sparse random matrices with iid entries with the additional assumption that the real and imaginary
parts of each entry are independent. In Figure 1, parts (b) and (d) give examples of the non-sparse
circular law for Bernoulli and Gaussian random variables, and parts (a) and (c) give examples of
the sparse circular law for Bernoulli and Gaussian random variables.

The literature studying the eigenvalues of sparse random matrices is distinctly smaller than 
that for non-sparse random matrices. Most authors have focused on studying the eigenvalues in 
the symmetric case, including [20l [181 [SI [2T1 [TBI [l9l [TU [22] . There has been, however, some recent 
and notable progress for non-symmetric sparse random matrices. Götze and Tikhomirov [9, 10]
provide sparse versions for their proofs of the circular law with some extra conditions. In [9] they 
use the additional assumptions that the entries are sub-Gaussian and that each entry is zero with 
probability pn where Pnn'^ — t- oo as n — t- oo, and in |10j they use the additional assumption that the 
entries have bounded $(2+\delta)$th moment. The strongest result in the literature for non-symmetric
sparse random matrices is due to Tao and Vu [25], who in 2008 proved a sparse version of the
circular law with the assumption of bounded $(2+\delta)$th moment (note that [25] proves almost sure
convergence, rather than convergence in probability as shown by [9, 10]).

Theorem 1.2. [25, Theorem 1.3] Let $\alpha > 0$ and $\delta > 0$ be arbitrary positive constants. Assume that
x is a complex random variable with zero mean and finite $(2+\delta)$th moment. Set $p = n^{-1+\alpha}$ and let
$A_n$ be the matrix with each entry an iid copy of $\frac{1}{\sqrt{p}} I_p x$, where $I_p$ is a random variable independent
of x taking the value 1 with probability $p$ and the value 0 with probability $1 - p$. Let $\mu_{\frac{1}{\sigma\sqrt{n}} A_n}$ be the
ESD of $\frac{1}{\sigma\sqrt{n}} A_n$, where $\sigma^2$ is, as usual, the variance of x. Then $\mu_{\frac{1}{\sigma\sqrt{n}} A_n}$ converges in the almost
sure sense to the uniform distribution $\mu_\infty$ over the unit disk as n tends to infinity.

In this paper, we prove a sparse circular law without the bounded $(2+\delta)$th moment condition,
with our work being motivated by the proof in [27] of the (non-sparse) circular law in the general
zero mean, unit variance case.

[Figure 1 panels: (a) Sparse Bernoulli; (b) Non-sparse Bernoulli; (c) Sparse Gaussian; (d) Non-sparse Gaussian]

Figure 1: The four figures above illustrate that the circular law holds for Bernoulli and Gaussian
random matrix ensembles in both the sparse and non-sparse cases. Each plot is of the eigenvalues
of a 2000 by 2000 random matrix with iid entries. In the first column (figures (a) and (c)), the
matrices are sparse with parameter $\alpha = 0.4$, which means each entry is zero with probability
$1 - n^{-1+\alpha} = 1 - n^{-0.6}$, and in the second column (figures (b) and (d)), the matrices are not sparse
(i.e., $\alpha = 1$). In the first row, both matrix ensembles are Bernoulli, so each non-zero entry is
equally likely to be -1 or 1, and in the second row, the ensembles are Gaussian, so the non-zero
entries are drawn from a Gaussian distribution with mean zero and variance one.

There has been much recent interest in demonstrating universal behavior for the eigenvalues of
various types of random matrices. The following theorem is a fundamental result from [27]. For a
matrix $A = (a_{ij})_{1 \le i,j \le n}$, we will use $\|A\|_2$ to denote the Hilbert-Schmidt norm, which is defined by

$$\|A\|_2 = \sqrt{\operatorname{trace}(AA^*)} = \Big( \sum_{1 \le i,j \le n} |a_{ij}|^2 \Big)^{1/2}.$$

Theorem 1.3 (Universality principle). [27] Let x and y be complex random variables with zero
mean and unit variance. Let $X_n := (x_{ij})_{1 \le i,j \le n}$ and $Y_n := (y_{ij})_{1 \le i,j \le n}$ be n × n random matrices
whose entries $x_{ij}$, $y_{ij}$ are iid copies of x and y, respectively. For each n, let $M_n$ be a deterministic
n × n matrix satisfying

$$\sup_n \frac{1}{n^2} \|M_n\|_2^2 < \infty. \tag{1}$$

Let $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges in probability to zero.

The universality principle as proven in [27, Theorem 1.5] also includes an additional hypothesis
under which $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges almost surely to zero (see [27] for details). In [27], Tao and
Vu suggest the project of extending their universality principle for random matrices to the case of
sparse random matrices. In this paper, we will follow the program developed in [27] and prove a
universality principle for sparse random matrices.

1.2 New results for sparse random matrices 

We begin by defining the type of sparse matrix ensemble that we will consider in this paper. 

Definition 1.4 (Sparse matrix ensemble). Let $0 < \alpha \le 1$ be a constant, and let $I_p$ be the random
variable taking the value 1 with probability $p := n^{-1+\alpha}$ and the value 0 with probability $1 - p$. Let
x be a complex random variable that is independent of $I_p$. The n by n sparse matrix ensemble for
x with parameter $\alpha$ is defined to be the matrix $X_n$ where each entry is an iid copy of $\frac{1}{\sqrt{p}} I_p x$.
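
The definition translates directly into a sampling procedure. The sketch below is a minimal illustration using only the Python standard library; the names `sample_sparse_matrix` and `random_sign`, the list-of-lists representation, and the choice of a random sign for x are assumptions made for the example and do not come from the paper.

```python
import math
import random

def sample_sparse_matrix(n, alpha, draw_x, rng):
    """Return an n by n matrix (list of lists) with iid entries (1/sqrt(p)) * I_p * x,
    where p = n**(-1 + alpha) and I_p is a Bernoulli(p) indicator independent of x."""
    p = n ** (-1.0 + alpha)
    scale = 1.0 / math.sqrt(p)
    # The indicator is sampled first; x is drawn only when the indicator equals 1.
    return [[scale * draw_x(rng) if rng.random() < p else 0.0
             for _ in range(n)] for _ in range(n)]

def random_sign(rng):
    """Bernoulli (random sign) variable: +1 or -1 with equal probability."""
    return rng.choice((-1.0, 1.0))

rng = random.Random(0)  # fixed seed, for reproducibility only
n, alpha = 50, 0.4
X = sample_sparse_matrix(n, alpha, random_sign, rng)
p = n ** (-1.0 + alpha)
```

The factor $1/\sqrt{p}$ keeps each entry at variance one when x has variance one, and setting `alpha = 1.0` gives p = 1, which recovers the non-sparse ensemble.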

The main result of the current paper is the following: 

Theorem 1.5 (Sparse universality principle). Let $0 < \alpha \le 1$ be a constant, and let x be a random
variable with mean zero and variance one. Assume that the real and imaginary parts of x are
independent; namely, that Re(x) is independent of Im(x). Let $X_n$ be the n by n sparse matrix ensemble
for x with parameter $\alpha$, and let $Y_n$ be the n by n matrix having iid copies of x for each entry (in
particular, $Y_n$ is not sparse). For each n, let $M_n$ be a deterministic n by n matrix such that

$$\sup_n \frac{1}{n^2} \|M_n\|_2^2 < \infty, \tag{2}$$

and let $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges in probability to
zero.

Figure 2 gives an illustration of Theorem 1.5 with non-trivial $M_n$, for sparse and non-sparse
Bernoulli and Gaussian ensembles. In [12] a description is given for the asymptotic distribution of
the ESD of a random matrix of the form $M_n + X_n$, where $M_n$ is an arbitrary diagonal matrix.

Relating the sparse case to the non-sparse case in the above theorem is quite useful, since many 
results are known for random matrices with non-sparse iid entries, including a number of results 
in [27]. One of the motivating consequences of Theorem 1.5 is the following result, which is a
combination of Theorem 1.5 and Theorem 1.1, the non-sparse circular law proven in [27].

Theorem 1.6 (Sparse circular law). Let $0 < \alpha \le 1$ be a constant and let x be a random complex
variable with mean zero and variance one, such that Re(x) and Im(x) are independent. Let $X_n$
be the sparse matrix ensemble for x with parameter $\alpha$. Then the ESD for $\frac{1}{\sqrt{n}} X_n$ converges in
probability to the uniform distribution on the unit disk.

An illustration of Theorem 1.6 appears in Figure 1. Note that the sparse circular law
(Theorem 1.6) does not hold when $\alpha = 0$, since the probability of a row of all zeroes approaches a
constant as $n \to \infty$, and thus with probability tending to 1 as $n \to \infty$, a constant fraction of the
rows contain all zeroes.
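
The failure at $\alpha = 0$ can be checked numerically. The sketch below assumes, for simplicity, that x is non-zero almost surely (as for random signs), so that a fixed row vanishes with probability exactly $(1 - p)^n$ with $p = n^{-1+\alpha}$; the function name `prob_zero_row` is an illustrative choice. For $\alpha = 0$ this is $(1 - 1/n)^n$, which tends to $e^{-1}$, while for any fixed $\alpha > 0$ it decays roughly like $\exp(-n^{\alpha})$.

```python
import math

def prob_zero_row(n, alpha):
    """Probability that a fixed row of the n by n sparse ensemble is identically zero,
    assuming each entry is non-zero with probability exactly p = n**(-1 + alpha)."""
    p = n ** (-1.0 + alpha)
    return (1.0 - p) ** n

for n in (10**2, 10**4, 10**6):
    # Second column: alpha = 0 (approaches exp(-1)); third column: alpha = 0.4 (vanishes).
    print(n, prob_zero_row(n, 0.0), prob_zero_row(n, 0.4))
```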

In the non-sparse case, [27] also gives a number of extensions and generalizations, one of which 
is the circular law for shifted matrices, including the case where the entries of a random matrix 
have constant, non-zero mean. 

Theorem 1.7 (Non-sparse circular law for shifted matrices). [27, Corollary 1.12] Let $X_n$ be the n
by n random matrix whose entries are iid complex random variables with mean 0 and variance 1,
and let $M_n$ be a deterministic matrix with rank o(n) and obeying Inequality (1). Let $A_n := M_n + X_n$.
Then the ESD of $\frac{1}{\sqrt{n}} A_n$ converges (both in probability and in the almost sure sense) to the uniform
distribution on the unit disk.

Because Theorem 1.7 applies to non-sparse matrices of the form $M_n + X_n$, it can be directly
combined with the sparse universality principle of Theorem 1.5 to yield the following result:

Theorem 1.8 (Sparse circular law for shifted matrices). Let $0 < \alpha \le 1$ be a constant, and let x
be a complex random variable with mean 0 and variance 1 such that the real and imaginary parts of
x are independent (i.e., Re(x) and Im(x) are independent). Let $X_n$ be the n by n sparse random
matrix ensemble with parameter $\alpha$, let $M_n$ be a deterministic matrix with rank o(n) and obeying
Inequality (1), and let $A_n := M_n + X_n$. Then the ESD of $\frac{1}{\sqrt{n}} A_n$ converges in probability to the
uniform distribution on the unit disk.

An example of Theorems 1.7 and 1.8 appears in Figure 3.

The simple lemma below is a critical component for adapting arguments from [27] to the sparse
case and illustrates a critical transition that occurs when $\alpha = 0$.



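Lemma 1.9. Let $\xi$ be a complex random variable such that $E|\xi| < \infty$. Let $X$ be a sparse version
of $\xi$, namely $X := I_p \xi / p$, where $p = n^{-1+\alpha}$ and $0 < \alpha \le 1$ is a constant. Then

$$E\left( |X| \, 1_{\{|X| > n^{1-\alpha/2}\}} \right) \to 0, \quad \text{as } n \to \infty.$$

Proof. The key steps to this proof are using independence of $I_p$ and $\xi$, and applying monotone
convergence. Since $|X| > n^{1-\alpha/2}$ requires $I_p = 1$ and $|\xi| > p\, n^{1-\alpha/2} = n^{\alpha/2}$, we compute:

$$E\left( |X| \, 1_{\{|X| > n^{1-\alpha/2}\}} \right) \le \frac{1}{p} E(I_p) \, E\left( |\xi| \, 1_{\{|\xi| > n^{\alpha/2}\}} \right) = E\left( |\xi| \, 1_{\{|\xi| > n^{\alpha/2}\}} \right).$$

Finally, $E\left( |\xi| \, 1_{\{|\xi| > n^{\alpha/2}\}} \right) \to 0$ as $n \to \infty$ by monotone convergence.

□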
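
To make the transition at $\alpha = 0$ concrete, here is a small sketch under the assumption (not from the paper) that $\xi$ is a standard exponential random variable, for which $E(\xi \, 1_{\{\xi > t\}}) = (t + 1)e^{-t}$ in closed form. With $X = I_p \xi / p$ and $p = n^{-1+\alpha}$, the event $|X| > n^{1-\alpha/2}$ forces $|\xi| > n^{\alpha/2}$, so the truncated mean in the lemma reduces to the tail mean of $\xi$ beyond $n^{\alpha/2}$. The function names are hypothetical.

```python
import math

def tail_mean_exponential(t):
    """E(xi * 1{xi > t}) for xi ~ Exponential(1): the integral of s*exp(-s) over (t, inf)."""
    return (t + 1.0) * math.exp(-t)

def truncated_mean(n, alpha):
    """E(|X| 1{|X| > n**(1 - alpha/2)}) for X = I_p * xi / p with p = n**(-1 + alpha).
    The factor E(I_p)/p equals 1, leaving the tail mean of xi beyond n**(alpha/2)."""
    return tail_mean_exponential(n ** (alpha / 2.0))

for n in (10**2, 10**4, 10**6):
    # alpha = 0.4: the threshold n**0.2 grows and the value tends to zero;
    # alpha = 0.0: the threshold stays at 1 and the value stays at 2/e.
    print(n, truncated_mean(n, 0.4), truncated_mean(n, 0.0))
```

For $\alpha = 0$ the threshold on $\xi$ does not grow with n, which is why monotone convergence gives nothing in that case.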

[Figure 2 panels: (a) Sparse Bernoulli; (b) Non-sparse Bernoulli; (c) Sparse Gaussian; (d) Non-sparse Gaussian]

Figure 2: The four plots above illustrate that the universality principle holds for Bernoulli and
Gaussian random matrix ensembles in both the sparse and non-sparse cases. Each plot is of the
eigenvalues of a 10,000 by 10,000 random matrix of the form $M_n + X_n$, where $M_n$ is a fixed,
non-random matrix and $X_n$ contains iid entries. For each of the four plots, $\frac{1}{\sqrt{n}} M_n$ is the diagonal
matrix with the first $\lfloor n/4 \rfloor$ diagonal entries equal to $-1 - \sqrt{-1}$, the next $\lfloor n/6 \rfloor$ diagonal entries
equal to $1.2 - 0.8\sqrt{-1}$, the next $n/12$ diagonal entries equal to $1.5 + 0.3\sqrt{-1}$, and the remaining
entries equal to zero. In the first column (figures (a) and (c)), the matrices $X_n$ are sparse with
parameter $\alpha = 0.5$, which means each entry is zero with probability $1 - n^{-1+\alpha} = 1 - n^{-0.5}$, and in the
second column (figures (b) and (d)), the matrices $X_n$ are not sparse (i.e., $\alpha = 1$). In the first row, both
matrix ensembles are Bernoulli, so each non-zero entry of $X_n$ is equally likely to be -1 or 1, and
in the second row, the ensembles are Gaussian, so the non-zero entries of $X_n$ are drawn from a
Gaussian distribution with mean zero and variance one.



[Figure 3 panels: (a) n = 100, Sparse Bernoulli; (b) n = 100, Non-sparse Bernoulli; (c) n = 1000, Sparse Bernoulli; (d) n = 1000, Non-sparse Bernoulli; (e) n = 10,000, Sparse Bernoulli; (f) n = 10,000, Non-sparse Bernoulli]

Figure 3: The six figures above illustrate that the circular law holds for shifted sparse Bernoulli
and shifted non-sparse Bernoulli random matrix ensembles. Each plot is of the eigenvalues of an n
by n (with n as specified) random matrix of the form $M_n + X_n$, where $M_n$ is a non-random diagonal
matrix with the first [-v/nj diagonal entries equal to and the remaining entries equal to zero, 
and $X_n$ contains iid random entries. In the first column (figures (a), (c), and (e)), the matrices are
sparse with parameter $\alpha = 0.4$, which means each entry is zero with probability $1 - n^{-1+\alpha} = 1 - n^{-0.6}$, and in
the second column (figures (b), (d), and (f)), the matrices are not sparse (i.e., $\alpha = 1$). The matrix
ensembles are Bernoulli, so each non-zero entry is equally likely to be -1 or 1. As n increases, the
ESDs in both the sparse and non-sparse cases approach the uniform distribution on the unit disk.
Empirically, the small circle on the right, which has roughly $\sqrt{n}$ eigenvalues in and near it, shrinks
until its contribution to the ESD is negligible (as drawn, the small circle has radius $n^{-1/4}$).

Remark 1.10. The proof of Lemma 1.9 illustrates that $p = 1/n$ is a transition point for sparse
random variables of the type $I_p \xi$ where the arguments for universality break down. Notably, the
proof of Lemma 1.9 also works for $\alpha$ depending on n so long as $\alpha \log n$ tends to infinity as $n \to \infty$;
for example a = „ is suitable. It would be interesting to see if the universality principle 

extends to parameters $\alpha$ that tend slowly to zero as $n \to \infty$.



1.3 Further directions 

There are a number of natural further directions to consider with respect to the sparse universality
principle, Theorem 1.5. One natural question is whether the condition that the real and imaginary
parts are independent can be removed. The condition is not necessary in the non-sparse case
and seems to be an artifact of the proof; more discussion is provided in Remark 2.4. Another
natural question is whether Theorem 1.5 can be generalized to prove almost sure convergence in
addition to proving convergence in probability. A result of Dozier and Silverstein [4] is one of the
ingredients used in [27] to prove almost sure convergence; however, there does not seem to be a
sparse analog of [4]. Proving a sparse analog of [4] would be a substantial step towards proving a
universality principle with almost sure convergence (see Remark 2.5), though there may be other
avenues as well. Finally, a general question of interest would be to study the rates of convergence
for the universality principle. Convergence seems reasonably fast in the non-sparse case; however,
empirical evidence indicates that convergence is slower in the sparse case and may in fact depend
on the underlying type of random variables; see Figure 4 for an example. A bound on convergence
rates in the non-sparse case where the $(2+\delta)$th moment is bounded is given in [25, Section 14].



1.4 Definitions of convergence and notation 



Let $X$ be a random variable taking values in a Hausdorff topological space. We say that $X_n$
converges in probability to $X$ if for every neighborhood $N_X$ of $X$, we have

$$\lim_{n \to \infty} \Pr(X_n \in N_X) = 1.$$

Furthermore, we say that $X_n$ converges almost surely to $X$ if

$$\Pr\left( \lim_{n \to \infty} X_n = X \right) = 1.$$

If $C_n$ is a sequence of random variables taking values in $\mathbb{R}$, we say that $C_n$ is bounded in
probability if

$$\lim_{K \to \infty} \liminf_{n \to \infty} \Pr(C_n \le K) = 1.$$

In the current paper, we are interested in how a randomly generated sequence of ESDs $\mu_{A_n}$
converges as $n \to \infty$, and so we will put the standard vague topology on the space of probability
measures on $\mathbb{C}$. In particular, if $\mu_n$ and $\mu'_n$ are randomly generated sequences of measures on $\mathbb{C}$,
then $\mu_n$ converges to $\mu'_n$ in probability if for every smooth function $f$ with compact support and
for every $\varepsilon > 0$, we have

$$\lim_{n \to \infty} \Pr\left( \left| \int_{\mathbb{C}} f \, d\mu_n - \int_{\mathbb{C}} f \, d\mu'_n \right| < \varepsilon \right) = 1.$$

Furthermore, $\mu_n$ converges to $\mu'_n$ almost surely if for every smooth function $f$ with compact support
and for every $\varepsilon > 0$, the expression $\left| \int_{\mathbb{C}} f \, d\mu_n - \int_{\mathbb{C}} f \, d\mu'_n \right|$ converges to 0 with probability 1.

[Figure 4 panels: (a) Sparse Bernoulli; (b) Non-sparse Bernoulli; (c) Sparse Gaussian; (d) Non-sparse Gaussian]

Figure 4: The four figures above indicate that the rates of convergence to the uniform distribution on
the unit disk for sparse Bernoulli and sparse Gaussian random matrix ensembles are apparently
not the same as each other, and that in particular the sparse Gaussian case converges more slowly
than the non-sparse case. Each plot is of the eigenvalues of a 2000 by 2000 random matrix with iid
entries. In the first column (figures (a) and (c)), the matrices are sparse with parameter $\alpha = 0.2$,
which means each entry is zero with probability $1 - n^{-1+\alpha} = 1 - n^{-0.8}$, and in the second column (figures (b) and
(d)), the matrices are not sparse (i.e., $\alpha = 1$). In the first row, both matrix ensembles are Bernoulli,
so each non-zero entry is equally likely to be -1 or 1, and in the second row, the ensembles are
Gaussian, so the non-zero entries are drawn from a Gaussian distribution with mean zero and
variance one.

For functions $f$ and $g$ depending on n, we will make use of the asymptotic notation $f = O(g)$ to
mean that there exists a positive constant c (independent of n) such that $f \le cg$ for all sufficiently
large n. Also, we will use the asymptotic notation $f = o(g)$ to mean that $f/g \to 0$ as $n \to \infty$.



1.5 Paper outline 

Recall that the sparseness is determined by $p := n^{-1+\alpha}$. In the remaining sections, we will follow
the approach used in [27] to prove a universality principle for sparse random matrices when $\alpha > 0$.
In Section 2, we outline the main steps of the proof, highlighting a general result about convergence
of ESDs from [27] that essentially reduces the question of convergence of ESDs to a question of
convergence of the determinants of the corresponding matrices (one of which is sparse, and the
other of which is not). Section 3 gives a proof of a sparse version of the necessary result on
convergence of determinants based on a least singular value bound for sparse matrices in [25] and
two lemmas, which are proved in Sections 4 and 5, respectively. In Section 5, we make use of a
result of Chatterjee [3], which requires adapting Krishnapur's ideas in [27, Appendix C] to a sparse
context ([27, Appendix C] is dedicated to proving a universality principle for non-sparse random
matrices where the entries are not necessarily iid).



2 Proof of Theorem 1.5

The following result was proven by Tao and Vu [27, Theorem 2.1] and can be applied directly in
proving Theorem 1.5. All logarithms in this paper are natural unless otherwise noted.

Theorem 2.1. [27] Suppose for each n that $A_n, B_n \in M_n(\mathbb{C})$ are ensembles of random matrices.
Assume that

(i) The expression

$$\frac{1}{n^2} \|A_n\|_2^2 + \frac{1}{n^2} \|B_n\|_2^2 \tag{3}$$

is bounded in probability.

(ii) For almost all complex numbers z,

$$\frac{1}{n} \log \left| \det\left( \frac{1}{\sqrt{n}} A_n - zI \right) \right| - \frac{1}{n} \log \left| \det\left( \frac{1}{\sqrt{n}} B_n - zI \right) \right|$$

converges in probability to zero. In particular, for each fixed z, these determinants are
non-zero with probability 1 - o(1) for all n.

Then $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges in probability to zero.



Note that a stronger version of the above theorem appears in [27, Theorem 2.1], which additionally
gives conditions under which $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges almost surely to zero.

The lemma below is a sparse version of [27, Lemma 1.7].

Lemma 2.2. Let $M_n$, $A_n$, and $B_n$ be as in Theorem 1.5. Then $\frac{1}{n^2} \|A_n\|_2^2$ and $\int_{\mathbb{C}} |z|^2 \, d\mu_{\frac{1}{\sqrt{n}} A_n}(z)$
are bounded in probability, and the same statement holds with $B_n$ replacing $A_n$.

Proof. Our proof is the same as the proof of [27, Lemma 1.7], except that we need to use a sparse
version of the law of large numbers (see Lemma A.1). By the Weyl comparison inequality for second
moment (see [27, Lemma A.2]), it suffices to prove that $\frac{1}{n^2} \|A_n\|_2^2$ is bounded in probability, and by
the triangle inequality along with Inequality (2), it thus suffices to show that $\frac{1}{n^2} \|X_n\|_2^2$ is bounded
in probability. By the sparse law of large numbers (see Lemma A.1) and the fact that $E|x|^2 < \infty$,
we see that $\frac{1}{n^2} \|X_n\|_2^2$ is bounded in probability. The statement with $B_n$ replacing $A_n$ is exactly
[27, Lemma 1.7]. □

The proof of Theorem 1.5 is completed by combining Theorem 2.1 and Lemma 2.2 with the
following proposition: 

Proposition 2.3. Let $0 < \alpha \le 1$ be a constant and let x be a random variable with mean zero and
variance one. Assume that the real and imaginary parts of x are independent; namely, that Re(x) is
independent of Im(x). Let $X_n$ be the sparse matrix ensemble for x with parameter $\alpha$, and let $Y_n$ be
the n by n matrix having iid copies of x for each entry (in particular, $Y_n$ is not sparse). For each
n, let $M_n$ be a deterministic n by n matrix satisfying Inequality (2) and let $A_n := M_n + X_n$ and let
$B_n := M_n + Y_n$. Then, for every fixed $z \in \mathbb{C}$, we have that

$$\frac{1}{n} \log \left| \det\left( \frac{1}{\sqrt{n}} A_n - zI \right) \right| - \frac{1}{n} \log \left| \det\left( \frac{1}{\sqrt{n}} B_n - zI \right) \right| \tag{4}$$

converges in probability to zero.



One useful property of the determinant is that it may be computed in a number of different 
ways. In particular, for a matrix M, we have 

$$|\det(M)| = \prod_{i=1}^{n} |\lambda_i(M)| = \prod_{i=1}^{n} \sigma_i(M) = \prod_{i=1}^{n} \mathrm{dist}(R_i, \mathrm{Span}\{R_1, \ldots, R_{i-1}\}), \tag{5}$$

where $\lambda_i(M)$ and $\sigma_i(M)$ are the eigenvalues and singular values of $M$, respectively, and where $R_i$
denotes the i-th row of $M$.
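
The row-distance form of the determinant, which drives the rest of the argument, can be checked by hand on a small example. The sketch below is an illustration only: it is restricted to real matrices, uses Gram-Schmidt for the distances and Gaussian elimination for the determinant, and the function names and the example matrix are assumptions of this sketch. The eigenvalue and singular value products are not computed here.

```python
import math

def row_distances(rows):
    """dist(R_i, Span{R_1, ..., R_{i-1}}) for each row, via Gram-Schmidt (real entries)."""
    basis = []  # orthonormal basis for the span of the rows processed so far
    dists = []
    for row in rows:
        residual = [float(v) for v in row]
        for q in basis:
            coeff = sum(r * qk for r, qk in zip(residual, q))
            residual = [r - coeff * qk for r, qk in zip(residual, q)]
        d = math.sqrt(sum(r * r for r in residual))
        dists.append(d)
        if d > 1e-12:
            basis.append([r / d for r in residual])
    return dists

def abs_det(rows):
    """|det(M)| by Gaussian elimination with partial pivoting (real entries)."""
    a = [[float(v) for v in row] for row in rows]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        a[col], a[pivot] = a[pivot], a[col]  # a swap flips the sign, removed by abs() below
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return abs(det)

M = [[2, 0, 0], [1, 3, 0], [4, 5, 6]]  # hypothetical example matrix
dists = row_distances(M)               # distances 2, 3, 6, whose product is |det(M)| = 36
```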

In the remainder of the current section, we will outline the program for proving Proposition 2.3
and describe the differences between our proof and the proof of [27, Proposition 2.2]. As in [27], we
will prove Proposition 2.3 by writing the determinant as a product of distances between the i-th
row of a matrix and the span of the first i - 1 rows (thanks to Equation (5)). Proposition 2.3 can
then be proven via three main steps:

1. A bound on the least singular value due to Tao and Vu [25] for sparse and non-sparse random
matrices is used to take care of terms corresponding to very high-dimensional subspaces (i.e., the
span of more than $n - n^{1-\alpha/6}$ rows).

2. Talagrand's inequality is used, along with other ideas from [27], to take care of terms with
high dimension (i.e., the span of more than $(1 - \delta)n$ rows) not already dealt with by the previous
step. Some care must be taken in the sparse case with the constant $\alpha$ in the exponent in
order to use Talagrand's inequality, which is where the $\alpha/6$ comes from in the previous step.

3. A result of Chatterjee [3], along with new ideas in [27], is used to take care of the remaining
terms. Here, the sparse case differs substantially from the non-sparse case, in that we must
use Chatterjee's result [3] in place of a result due to Dozier and Silverstein [4] used in [27].
This step in general follows Krishnapur [27, Appendix C], who investigates a universality
principle for non-sparse random matrices with not necessarily iid entries, since there Dozier
and Silverstein's result [4] cannot be applied. Our use of Chatterjee's result [3] requires the
hypothesis that the real and imaginary parts are independent, and this is the only place where
that hypothesis is used.

Remark 2.4. It would be interesting to prove Theorem 1.5 without the hypothesis that Re(x) and
Im(x) are independent. The only place that this hypothesis is used is when applying a result
of Chatterjee [3, Theorem 1.1] (see Theorem 5.6), which was originally proven for independent
real random variables. Two possible ways to proceed in removing the hypothesis that Re(x)
and Im(x) are independent would be proving a complex version of [3, Theorem 1.1] (though the
result would require some re-phrasing, since the condition of differentiability is very different in the
complex and real cases) or, alternatively, proving a sparse version of [4] (see Remark 2.5).

Remark 2.5. It would be natural to investigate a version of Theorem 1.5 where convergence in the
almost sure sense is proved rather than convergence in probability. Typically, proving almost sure
convergence is harder than proving convergence in probability; however, the universality principle
in [27] is proven for both types of convergence, and so may provide a general approach to proving
a universality principle for sparse random matrices with almost sure convergence. One of the steps
in proving the universality principle of [27] in the almost sure sense uses a result due to Dozier and
Silverstein [4]. In [4], a truncation argument is used that seems like it would need to be altered
or replaced in order to prove a result for sparse random matrices. Another possible approach to
proving a version of Theorem 1.5 for almost sure convergence would be to prove an analog of
Chatterjee's [3, Theorem 1.1] (see Theorem 5.6) for almost sure convergence, though this might
require a very different type of argument than the one used in [3]. A version of Lemma A.1 for
almost sure convergence would also likely be necessary in any case.

3 Proof of Proposition 2.3

By shifting $M_n$ by $zI\sqrt{n}$ (and noting that the new $M_n$ still satisfies Inequality (2)), it is sufficient
to prove that

$$\frac{1}{n} \log \left| \det\left( \frac{1}{\sqrt{n}} A_n \right) \right| - \frac{1}{n} \log \left| \det\left( \frac{1}{\sqrt{n}} B_n \right) \right|$$

converges to zero in probability.

Following the notation of [27], let $X_1, \ldots, X_n$ be the rows of $A_n$ and let $Y_1, \ldots, Y_n$ be the rows
of $B_n$. Let $Z_1, \ldots, Z_n$ denote the rows of $M_n$, and note that by Inequality (2) we have that

$$\sum_{i=1}^{n} \|Z_i\|_2^2 = O(n^2).$$

By re-ordering the rows of $A_n$, $B_n$, and $M_n$ if necessary, we may assume that the rows $Z_{\lceil n/2 \rceil}, \ldots, Z_n$
have the smallest norms, and so

$$\|Z_i\|_2 = O(\sqrt{n}), \quad \text{for } n/2 \le i \le n. \tag{6}$$

This fact will be used in part of the proof of Lemma 3.2.

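For $1 \le i \le n$, let $V_i$ be the $(i-1)$-dimensional space generated by $X_1, \ldots, X_{i-1}$ and let $W_i$ be
the $(i-1)$-dimensional space generated by $Y_1, \ldots, Y_{i-1}$. By standard formulas for the determinant
(see Equation (5)), we have that

$$\frac{1}{n} \log \left| \det\left( \frac{1}{\sqrt{n}} A_n \right) \right| = \frac{1}{n} \sum_{i=1}^{n} \log \mathrm{dist}\left( \frac{1}{\sqrt{n}} X_i, V_i \right) \quad \text{and}$$

$$\frac{1}{n} \log \left| \det\left( \frac{1}{\sqrt{n}} B_n \right) \right| = \frac{1}{n} \sum_{i=1}^{n} \log \mathrm{dist}\left( \frac{1}{\sqrt{n}} Y_i, W_i \right).$$

It is thus sufficient to show that

$$\frac{1}{n} \sum_{i=1}^{n} \left( \log \mathrm{dist}\left( \frac{1}{\sqrt{n}} X_i, V_i \right) - \log \mathrm{dist}\left( \frac{1}{\sqrt{n}} Y_i, W_i \right) \right) \tag{7}$$

converges in probability to zero. We will start by proving somewhat weak upper and lower bounds
on $\mathrm{dist}(\frac{1}{\sqrt{n}} X_i, V_i)$ and $\mathrm{dist}(\frac{1}{\sqrt{n}} Y_i, W_i)$ that hold for all i. For the upper bound, note that by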

Chebyshev's inequality we have Pr(||Xj||2 > n^) < n~^, and thus by the Borel-Cantelli lemma, we 
have with probability 1 that ||^i||2 < for all but finitely many n and for all i. This implies that, 
with probability 1, 



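$$\mathrm{dist}\left( \frac{1}{\sqrt{n}} X_i, V_i \right) \le \|X_i\|_2 = n^{O(1)}$$

for all but finitely many n and for all i; and the same bound also holds for $\mathrm{dist}(\frac{1}{\sqrt{n}} Y_i, W_i)$. To show
a lower bound, define $S_j^{(i)} := \mathrm{Span}(\{X_1, \ldots, X_i\} \setminus \{X_j\})$ and define $A_n^{(i)}$ to be the i by n matrix
consisting of the first i rows of $A_n$. By [27, Lemma A.4], we have

$$\sum_{j=1}^{i} \mathrm{dist}(X_j, S_j^{(i)})^{-2} = \sum_{j=1}^{i} \sigma_j(A_n^{(i)})^{-2},$$

and since $V_i = S_i^{(i)}$, we thus have the crude bound

$$\mathrm{dist}(X_i, V_i)^{-2} \le n \, \sigma_i(A_n^{(i)})^{-2}.$$

By Cauchy interlacing (see [27, Lemma A.1]), we know that $\sigma_i(A_n^{(i)}) \ge \sigma_n(A_n)$, and thus we have

$$\frac{1}{n} \sigma_n(A_n) \le \mathrm{dist}\left( \frac{1}{\sqrt{n}} X_i, V_i \right),$$

and by the same reasoning,

$$\frac{1}{n} \sigma_n(B_n) \le \mathrm{dist}\left( \frac{1}{\sqrt{n}} Y_i, W_i \right).$$

Lower bounds on $\mathrm{dist}(\frac{1}{\sqrt{n}} X_i, V_i)$ and $\mathrm{dist}(\frac{1}{\sqrt{n}} Y_i, W_i)$ will now follow from lower bounds on the least
singular values of $A_n$ and $B_n$, which were proven in [25].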

Lemma 3.1 (Least singular value bound for sparse random matrices). [25] Let $0 < \alpha \le 1$ be a
constant and let x be a random variable with mean zero and variance one. Let $X_n$ be the sparse
matrix ensemble for x with parameter $\alpha$, and let $Y_n$ be the n by n matrix having iid copies of x for
each entry (in particular, $Y_n$ is not sparse). For each n, let $M_n$ be a deterministic n by n matrix
satisfying Inequality (2) and let $A_n := M_n + X_n$ and let $B_n := M_n + Y_n$. Then with probability 1
we have

$$\sigma_n(A_n), \sigma_n(B_n) \ge n^{-O(1)}$$

for all but finitely many n.

Proof. Paraphrasing [27, proof of Lemma 4.1], the proof follows by combining [25, Theorem 2.5] (for
the non-sparse matrix) and [25, Theorem 2.9] (for the sparse matrix) each with the Borel-Cantelli
lemma, noting that the hypotheses of [25, Theorem 2.5] and [25, Theorem 2.9] are satisfied due to
[25, Lemma 2.4] and Inequality (2). □



Thus, with probability 1 we have

$$\left| \log \mathrm{dist}\left( \frac{1}{\sqrt{n}} X_i, V_i \right) \right|, \ \left| \log \mathrm{dist}\left( \frac{1}{\sqrt{n}} Y_i, W_i \right) \right| \le O(\log n) \tag{8}$$

for all but finitely many n. In light of Inequality (8), the following two lemmas suffice to prove that
the quantity in Display (7) converges in probability to zero.

Recall that $\alpha$ is the parameter used to determine the sparseness of the sparse matrix ensemble.

Lemma 3.2 (High-dimensional contribution). For every $\varepsilon > 0$, there exists a constant $0 < \delta_\varepsilon < 1/2$
such that for every $0 < \delta < \delta_\varepsilon$ we have with probability 1 that

$$\frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{1-\alpha/6}} \left| \log \mathrm{dist}\left( \frac{1}{\sqrt{n}} X_i, V_i \right) \right| = O(\varepsilon)$$

for all but finitely many n.

Note that Lemma 3.2 with $Y_i$ (which is not sparse) replacing $X_i$ and with $W_i$ replacing $V_i$ was
proven in [27, Lemma 4.2] with 0.99 replacing $1 - \alpha/6$. Alternatively, the non-sparse case follows
from our proof of Lemma 3.2 if one sets $\alpha = 1$ (giving an exponent of 5/6 in place of the exponent
0.99 used in [27, Lemma 4.2]). Also, note that for all sufficiently large n, we may assume that
Equation (6) holds for all i relevant to Lemma 3.2 above.



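Lemma 3.3 (Low-dimensional contribution). For every $\varepsilon > 0$, there exists $0 < \delta < \varepsilon$ such that
with probability at least $1 - O(\varepsilon)$ we have

\[ \Big| \frac{1}{n} \sum_{1 \le i \le (1-\delta)n} \Big[ \log\Big(\operatorname{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big)\Big) - \log\Big(\operatorname{dist}\Big(\frac{1}{\sqrt{n}} Y_i, W_i\Big)\Big) \Big] \Big| = O(\varepsilon) \]

for all but finitely many $n$.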



To complete the proof of Proposition 2.3 one may combine Lemma 3.2, [27, Lemma 4.2] (which
is the non-sparse analog of Lemma 3.2), and Lemma 3.3. In particular, one may simply set $\varepsilon$
in Lemma 3.3 equal to $\min\{\delta_1, \delta_2\}$, where $\delta_1$ is the upper bound on $\delta$ from Lemma 3.2 and $\delta_2$
is the corresponding upper bound on $\delta$ from [27, Lemma 4.2] (or from the non-sparse version of
Lemma 3.2).
4 Proof of Lemma 3.2



Following [27], we will prove Lemma 3.2 in two parts, splitting the summands into cases where the
log is positive and where the log is negative. The proof below follows the proof of [27, Lemma 4.2]
closely, and we have included it in detail to make explicit the role of $\alpha$, which determines the
sparseness of the matrix $A_n$. One place where particular care must be taken with the sparseness parameter
$\alpha$ is in a truncation argument needed to apply Talagrand's inequality (see Subsection 4.3). There,
we have made frequent use of the assumption that $\alpha$ is a positive constant, though it is possible
that a very slowly decreasing $\alpha$ could also work; see Lemma 1.9 and Remark 1.10.

4.1 Positive log component 

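By the Borel-Cantelli lemma, the desired bound on the positive log component may be proven by
showing

\[ \sum_{n=1}^{\infty} \Pr\Big( \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{1-\alpha/6}} \max\Big\{\log \operatorname{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big), 0\Big\} \ge \varepsilon \Big) < \infty. \tag{9} \]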



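We will use the crude bound $\max\{\log \operatorname{dist}(\frac{1}{\sqrt{n}} X_i, V_i), 0\} \le \max\{\log(\frac{\|X_i\|_2}{\sqrt{n}}), 0\}$. Note that if
$2^{m_0} \le \frac{\|X_i\|_2}{\sqrt{n}} < 2^{m_0+1}$, then $m_0 \le \log_2(\frac{\|X_i\|_2}{\sqrt{n}}) < m_0 + 1$, and so

\[ \sum_{m=0}^{\infty} \mathbf{1}_{\{\|X_i\|_2 \ge 2^m \sqrt{n}\}} = m_0 + 1 > \log_2\Big(\frac{\|X_i\|_2}{\sqrt{n}}\Big). \]

Thus,

\[ \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{1-\alpha/6}} \max\Big\{\log \operatorname{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big), 0\Big\} \le \sum_{m=0}^{\infty} \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{1-\alpha/6}} \mathbf{1}_{\{\|X_i\|_2 \ge 2^m \sqrt{n}\}}. \tag{10} \]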
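The dyadic counting identity behind Inequality (10) is easy to check numerically. The following is a minimal sketch, where `dyadic_count` is a hypothetical helper and `y` plays the role of $\|X_i\|_2/\sqrt{n}$.

```python
import math

def dyadic_count(y):
    """Number of integers m >= 0 with y >= 2**m (the sum of indicators over m)."""
    count = 0
    # The cap at 1023 keeps 2.0 ** count inside the range of a double.
    while count <= 1023 and y >= 2.0 ** count:
        count += 1
    return count

y = 5.0                      # lies in [2**2, 2**3), so m_0 = 2
count = dyadic_count(y)      # equals m_0 + 1
log2_y = math.log2(y)
print(count, log2_y)
```

For $y < 1$ the count is zero, matching the fact that the positive part of the logarithm vanishes there.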



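If the left-hand side of Inequality (10) is at least $\varepsilon$ for a given $n$, then we must have for some
$m \ge 0$ that

\[ \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{1-\alpha/6}} \mathbf{1}_{\{\|X_i\|_2 \ge 2^m \sqrt{n}\}} \ge \frac{2\varepsilon}{(100 + m)^2}. \tag{11} \]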

We now have two cases to consider. For the first case, assume that the smallest $m$ satisfying
Inequality (11) satisfies $m \ge n^{1/5}$. Then for Inequality (11) to be satisfied, there exists some
$1 \le i \le n$ such that $\|X_i\|_2 \ge 2^{n^{1/5}} \sqrt{n}$. By Chebyshev's inequality and Equation (6) we have
that $\Pr(\|X_i\|_2 \ge 2^{n^{1/5}} \sqrt{n}) \le O(2^{-2n^{1/5}})$, and thus the probability of such an $i$ existing is at most
$f(n) := 1 - (1 - c\,2^{-2n^{1/5}})^n$, where $c$ is some constant. It is not hard to show that $f(n) n^2 \to 0$ as
$n \to \infty$, and thus, for all sufficiently large $n$, the probability that there exists an $i$ such
that $\|X_i\|_2 \ge 2^{n^{1/5}} \sqrt{n}$ is at most $\varepsilon/n^2$. Since this probability is summable in $n$, we have proved
Inequality (9) in the first case.
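The claim that $f(n) n^2 \to 0$ only takes effect for quite large $n$, which a short numerical sketch makes visible. The function below is an illustration, with the constant $c$ set to 1 as an assumption; it uses `log1p` and `expm1` to avoid cancellation when the per-row probability is tiny.

```python
import math

def n_squared_f(n, c=1.0):
    """n^2 * f(n), where f(n) = 1 - (1 - c * 2**(-2 * n**(1/5)))**n."""
    q = c * 2.0 ** (-2.0 * n ** 0.2)       # single-row tail bound
    f = -math.expm1(n * math.log1p(-q))    # 1 - (1 - q)^n, computed stably
    return n * n * f

for n in (1e5, 1e10, 1e12):
    print(n, n_squared_f(n))
```

With these assumptions the product is still large at $n = 10^5$ but is astronomically small by $n = 10^{10}$, consistent with the "for all sufficiently large $n$" qualifier.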

For the second case, assume that the smallest $m$ satisfying Inequality (11) satisfies $0 \le m < n^{1/5}$.
In this case we will use Hoeffding's Inequality.

Theorem 4.1 (Hoeffding's Inequality [11]). Let $\beta_1, \dots, \beta_k$ be independent random variables such
that for $1 \le i \le k$ we have

\[ \Pr(\beta_i - \mathbb{E}(\beta_i) \in [0,1]) = 1. \]

Let $S := \sum_{i=1}^{k} \beta_i$. Then

\[ \Pr(S \ge kt + \mathbb{E}(S)) \le \exp(-2kt^2). \]
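As a sanity check on the shape of the Hoeffding bound, one can compare it with a simulated tail for sums of independent indicator variables, which is the situation in which the theorem is applied here. This is a rough sketch; the parameters (`k`, `p`, `t`, the number of trials, the seed) are arbitrary choices for illustration.

```python
import math
import random

def hoeffding_bound(k, t):
    """Right-hand side exp(-2 k t^2) of the inequality."""
    return math.exp(-2.0 * k * t * t)

def empirical_tail(k, p, t, trials, seed=0):
    """Empirical frequency of S >= k*t + E(S) for S a sum of k Bernoulli(p) indicators."""
    rng = random.Random(seed)
    threshold = k * t + k * p
    hits = 0
    for _ in range(trials):
        s = sum(1 for _ in range(k) if rng.random() < p)
        if s >= threshold:
            hits += 1
    return hits / trials

print(empirical_tail(200, 0.3, 0.1, 2000), hoeffding_bound(200, 0.1))
```

The simulated frequency sits well below the bound, as expected, since Hoeffding's inequality does not use the variance of the summands.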

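The random variables $\beta_i$ will be $\mathbf{1}_{\{\|X_i\|_2 \ge 2^m \sqrt{n}\}}$, and thus we need to control $\Pr(\|X_i\|_2 \ge 2^m \sqrt{n})$
in order to bound $\mathbb{E}(S)$. By Equation (6) and Chebyshev's inequality, we have that

\[ \Pr(\|X_i\|_2 \ge 2^m \sqrt{n}) \le O\Big(\frac{1}{2^{2m}}\Big). \tag{12} \]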



We will take $k = n - n^{1-\alpha/6} - (1-\delta)n$, so we have that $\lim_{n\to\infty} k/n = \delta$. Also, take $\delta_\varepsilon$ sufficiently small so
that 5e < 20000C 1 where C is the implicit constant in Inequality (fT2|) . If we take t = ^ ^ (loo+m)^ ) ' 
we can compute that 



n n ' '~(100 + m)2 " (100 + m) 



2 



for all sufficiently large n (the second inequality follows by taking n sufficiently large so that 
k/n <25 < 26^)- Thus, by Hoeffding's Inequality and taking n sufficiently large, we have 

V {l-5)n<i<n-ni-"/6 ^ 'I ^ ^ ' ^ 



\ ( -ne2 \ /-ni/5e2^ 



where the last inequality follows from our assumption in this second case that $0 \le m < n^{1/5}$. Thus,
we have shown for all sufficiently large $n$ and any $0 < \delta < \delta_\varepsilon$ that

Pr i ^.), 0} > < max exp (^^) , exp ( j . 

Finally, we note that the bounds from the two cases sum to at most e/n2+max |exp ^ g(2oo)4 ^ ' ^^P (~TM^) } ' 
which is summable in n, thus completing the proof for the positive log component. 

4.2 Negative log component 

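By the Borel-Cantelli Lemma, it suffices to show that

\[ \sum_{n=1}^{\infty} \Pr\Big( \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{1-\alpha/6}} \max\Big\{-\log \operatorname{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big), 0\Big\} \ge \varepsilon \Big) < \infty. \tag{13} \]

Following the approach in [27], our main tool is the following proposition.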
Proposition 4.2. Let $0 < \alpha \le 1$ be a constant, let $1 \le d \le n - n^{1-\alpha/6}$, let $0 < c < 1$ be a constant,
and let $W$ be a deterministic $d$-dimensional subspace of $\mathbb{C}^n$. Let $X$ be a row of $A_n$. Then

\[ \Pr\big(\operatorname{dist}(X, W) \le c\sqrt{n - d}\big) \le 6\exp(-n^{\alpha/2}) \]

for all $n$ sufficiently large with respect to $c$ and $\alpha$.

We will give the proof of Proposition 4.2 in Subsection 4.3. The proof of the negative log
component of Lemma 3.2 can be completed by using Proposition 4.2 and following the proof of [27,
Lemma 4.2], which we paraphrase below.

Taking $c = 1/2$ in Proposition 4.2 and conditioning on $V_i$, we have for each $(1-\delta)n \le i \le n - n^{1-\alpha/6}$ that

Fr (clis. (-Lx., F.) > ^^i^) > 1 - 0(exp(,.-"/=)). 
Thus, the probability that 



simultaneously for all (1 — 5)n < i < n — n"/^ is at least 1 — 0{n~^^) (in fact, better bounds are 
possible, but this is sufficient). 

Finally, choosing (5^ sufficiently small so that % log < e, we can take the log of Inequality (|14p 



2 "^e, 

10 



and sum in i to get that the probability in the summand of Inequality (I13p in at most 0(n 
and this is summable in n, completing the proof of Inequality (jlSp . 



4.3 Proof of Proposition 4.2



Recall that $X$ has coordinates $a_i = \frac{I_p x_i}{\sqrt{p}} + m_i$, where $m_i$ is a fixed element (it comes from the
matrix $M_n$), $x_i$ is a fixed, mean zero, variance 1 random variable (it does not change with $n$), and
$p = n^{-1+\alpha}$ where $0 < \alpha \le 1$ is a constant. The proof of Proposition 4.2 closely follows the proof
of [27, Proposition 5.1], and we give the details below to highlight how the proof must be modified
to accommodate sparseness with parameter $\alpha$. In particular, care must be taken with the value of
$\alpha$ in the following three steps: first, when reducing to the case where the sparse random variables
are bounded (since sparseness requires scaling by $1/n^{-1+\alpha}$), second, when showing that the sparse
random variables restricted to the bounded case still have variance tending to 1 as $n \to \infty$, and
third, when applying Talagrand's inequality where one must keep track of $\alpha$ in the exponent on
the upper bound.
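To see why boundedness has to be arranged by hand, it helps to compute moments of a centered sparse coordinate $a = I_p x/\sqrt{p}$ directly. The sketch below assumes $m_i = 0$ and uses a hypothetical helper `sparse_abs_moment`; it relies only on the independence of $I_p$ and $x$.

```python
def sparse_abs_moment(k, n, alpha, abs_moment_x):
    """E|a|^k for a = I_p * x / sqrt(p) with p = n**(alpha - 1), given E|x|^k."""
    p = n ** (alpha - 1.0)
    # I_p takes the value 1 with probability p and 0 otherwise, independently of x,
    # so E|a|^k = p * E|x|^k / p**(k/2).
    return p * abs_moment_x / p ** (k / 2.0)

# For a random sign x, every absolute moment E|x|^k equals 1.
print(sparse_abs_moment(2, 10**4, 0.5, 1.0))  # second moment stays at 1
print(sparse_abs_moment(4, 10**4, 0.5, 1.0))  # fourth moment grows like n**(1 - alpha)
```

The second moment is normalized to 1 for every $n$, while higher moments grow polynomially in $n$ when $\alpha < 1$, so the coordinates are not uniformly bounded even when $x$ is.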

Proof. First we reduce to the case where $X$ has mean 0. Let $v = \mathbb{E}(X)$. (Note that $v$ is the row of
$M_n$ corresponding to $X$.)

Note that $\operatorname{dist}(X, W) \ge \operatorname{dist}(X - v, \operatorname{Span}(W, v))$. Thus, by changing constants slightly (while
still preserving $0 < c < 1$) and replacing $d$ by $d + 1$, it suffices to prove Proposition 4.2 in the mean
zero case.

The second step is reducing to a case where the coordinates of $X$ are bounded. In particular, we
will show that, with probability at least $1 - 2\exp(-n^{\alpha/2})$, all but $n^{0.8}$ of the coordinates of $X$ take
values that are less than $n^{1/2-\alpha/4}$ in absolute value. Let $t_i := \mathbf{1}_{\{|a_i| \ge n^{(1-\alpha/2)/2}\}}$, and let $T := \sum_{i=1}^{n} t_i$. If $\mathbb{E}(T) = 0$,
then with probability 1 we have that $|a_i| < n^{1/2-\alpha/4}$ for every $i$ and we are done with the reduction to the
case where the coordinates are bounded. Thus, it is left to show this reduction in the case where
$\mathbb{E}(T) > 0$.

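By Chernoff (see [24, Corollary 1.9]) we know that for every $\varepsilon > 0$ we have

\[ \Pr\big(|T - \mathbb{E}(T)| \ge \varepsilon\,\mathbb{E}(T)\big) \le 2\exp\Big(-\min\Big\{\frac{\varepsilon^2}{4}, \frac{\varepsilon}{2}\Big\}\,\mathbb{E}(T)\Big). \]

Since $\mathbb{E}(T) > 0$ by assumption, we may set $\varepsilon := \frac{n^{0.8}}{\mathbb{E}(T)} - 1$. By Chebyshev's inequality, we have
$\Pr(|a_i| \ge n^{(1-\alpha/2)/2}) \le \frac{1}{n^{1-\alpha/2}}$ for all $1 \le i \le n$, and thus $\mathbb{E}(T) = n\,\mathbb{E}(t_i) \le n^{\alpha/2}$, which implies
that $\varepsilon \ge n^{0.8-\alpha/2} - 1 > 2$ for large $n$. Here we used the fact that $0 < \alpha/2 \le 0.5$. Using the Chernoff
bound we have

\[ \begin{aligned} \Pr\big(T \ge (1+\varepsilon)\mathbb{E}(T) = n^{0.8}\big) &\le 2\exp\Big(-\frac{\varepsilon}{2}\,\mathbb{E}(T)\Big) \\ &\le 2\exp\big(-n^{0.8}/2 + \mathbb{E}(T)/2\big) \\ &\le 2\exp\big(-n^{0.8}/2 + n^{\alpha/2}/2\big) \\ &\le 2\exp\big(-n^{0.8}/4\big) \\ &\le 2\exp\big(-n^{\alpha/2}\big). \end{aligned} \]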
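The last three steps of the Chernoff computation compare exponents and hold only once $n$ is large enough relative to $\alpha$. The sketch below, with hypothetical helper names, checks the two exponent comparisons $-n^{0.8}/2 + n^{\alpha/2}/2 \le -n^{0.8}/4 \le -n^{\alpha/2}$ and finds the first $n$ at which both hold.

```python
def chernoff_chain_holds(n, alpha):
    """Check -n^0.8/2 + n^(alpha/2)/2 <= -n^0.8/4 and -n^0.8/4 <= -n^(alpha/2)."""
    a = n ** 0.8
    b = n ** (alpha / 2.0)
    return (-a / 2.0 + b / 2.0 <= -a / 4.0) and (-a / 4.0 <= -b)

def first_n_where_chain_holds(alpha, limit=10**6):
    """Smallest n <= limit for which both comparisons hold, or None."""
    for n in range(1, limit + 1):
        if chernoff_chain_holds(n, alpha):
            return n
    return None

print(first_n_where_chain_holds(1.0))
print(first_n_where_chain_holds(0.5))
```

Both comparisons reduce to a lower bound on $n^{0.8 - \alpha/2}$, so once they hold they continue to hold for all larger $n$; the threshold is largest in the non-sparse case $\alpha = 1$.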

Thus, with probability at least $1 - 2\exp(-n^{\alpha/2})$, there are at most $n^{0.8}$ indices for which $|a_i| \ge
n^{(1-\alpha/2)/2}$. For a subset $I \subset \{1, 2, \dots, n\}$, let $E_I$ denote the event that $I = \{i : |a_i| \ge n^{1/2-\alpha/4},\ 1 \le
i \le n\}$.

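By the law of total probability, we have

\[ \Pr\big(\operatorname{dist}(X, W) \le c\sqrt{n-d}\big) \le 2\exp(-n^{\alpha/2}) + \sum_{\substack{I \subset \{1,\dots,n\} \\ |I| \le n^{0.8}}} \Pr\big(\operatorname{dist}(X, W) \le c\sqrt{n-d} \,\big|\, E_I\big) \Pr(E_I). \]

Thus, it is sufficient to show that

\[ \Pr\big(\operatorname{dist}(X, W) \le c\sqrt{n-d} \,\big|\, E_I\big) \le 4\exp(-n^{\alpha/2}) \]

for each $I \subset \{1, \dots, n\}$ such that $|I| \le n^{0.8}$.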

Fix such a set $I$. By renaming coordinates, we may assume that $I = \{n' + 1, \dots, n\}$ where
$n - n^{0.8} \le n' \le n$. The next step is projecting away the coordinates in $I$. In particular, let
$\pi : \mathbb{C}^n \to \mathbb{C}^{n'}$ be the orthogonal projection onto the first $n'$ coordinates, and note that

\[ \operatorname{dist}(X, W) \ge \operatorname{dist}(\pi(X), \pi(W)). \]

Thus, we can condition on $a_{n'+1}, \dots, a_n$, adjust $c$ slightly (without changing the fact that $0 < c < 1$),
and (abusing notation to henceforth let $n$ stand for $n'$) see that it is sufficient to show

\[ \Pr\big(\operatorname{dist}(X, W) \le c\sqrt{n-d} \,\big|\, |a_i| < n^{1/2-\alpha/4} \text{ for every } 1 \le i \le n\big) \le 4\exp(-n^{\alpha/2}). \]

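Lemma 4.3. Let $\tilde{a}_i$ be the random variable $a_i$ conditioned on $|a_i| < n^{1/2-\alpha/4}$. Then $\tilde{a}_i$ has variance
$1 + o(1)$.

Proof. By definition

\[ \begin{aligned} \operatorname{Var}(\tilde{a}_i) &= \mathbb{E}(|\tilde{a}_i|^2) - |\mathbb{E}(\tilde{a}_i)|^2 \\ &= \mathbb{E}\big(|a_i|^2 \,\big|\, |a_i| < n^{1/2-\alpha/4}\big) - \big|\mathbb{E}\big(a_i \,\big|\, |a_i| < n^{1/2-\alpha/4}\big)\big|^2 \\ &= \frac{1}{\Pr(|a_i| < n^{1/2-\alpha/4})}\,\mathbb{E}\big(|a_i|^2 \mathbf{1}_{\{|a_i| < n^{1/2-\alpha/4}\}}\big) - \frac{1}{\Pr(|a_i| < n^{1/2-\alpha/4})^2}\,\big|\mathbb{E}\big(a_i \mathbf{1}_{\{|a_i| < n^{1/2-\alpha/4}\}}\big)\big|^2. \end{aligned} \]

Note that $a_i = \frac{I_p x_i}{\sqrt{p}}$, and so $|a_i| < n^{1/2-\alpha/4}$ if and only if $|I_p x_i| < n^{\alpha/4}$. Since $x_i$ does not change
with $n$, we see that $\Pr(|a_i| < n^{1/2-\alpha/4}) = \Pr(|I_p x_i| < n^{\alpha/4}) \to 1$ as $n \to \infty$. Also, by Lemma 1.9
we know that $\mathbb{E}(|a_i|^2 \mathbf{1}_{\{|a_i| < n^{1/2-\alpha/4}\}}) \to \mathbb{E}(|a_i|^2) = 1$ and that $\mathbb{E}(a_i \mathbf{1}_{\{|a_i| < n^{1/2-\alpha/4}\}}) \to \mathbb{E}(a_i) = 0$.
Thus, we have shown that $\tilde{a}_i$ has variance $1 + o(1)$. □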
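The effect of the truncation on the variance can be computed exactly when $x$ has finitely many atoms. The following sketch uses a hypothetical test distribution (mean zero, variance 1, with a rare large atom at $\pm 5$) chosen only for illustration; it shows the conditioned variance sitting below 1 until the cutoff $n^{\alpha/4}$ passes the largest atom of $x$.

```python
import math

def truncated_sparse_variance(atoms, n, alpha):
    """Variance of a = I_p*x/sqrt(p) conditioned on |a| < n**(1/2 - alpha/4).

    atoms: list of (value, probability) pairs describing the law of x.
    """
    p = n ** (alpha - 1.0)
    cutoff = n ** (0.5 - alpha / 4.0)
    # Law of a: the value 0 with probability 1 - p, and value/sqrt(p) with probability p*prob.
    law = [(0.0, 1.0 - p)] + [(v / math.sqrt(p), p * w) for v, w in atoms]
    kept = [(v, w) for v, w in law if abs(v) < cutoff]
    mass = sum(w for _, w in kept)
    mean = sum(v * w for v, w in kept) / mass
    second = sum(v * v * w for v, w in kept) / mass
    return second - mean * mean

# Hypothetical law of x: atoms at +-A and +-5, symmetric, with variance exactly 1.
A = math.sqrt(0.75 / 0.99)
ATOMS = [(-A, 0.495), (A, 0.495), (-5.0, 0.005), (5.0, 0.005)]

print(truncated_sparse_variance(ATOMS, 100, 0.5))     # large atoms are cut off
print(truncated_sparse_variance(ATOMS, 10**6, 0.5))   # cutoff exceeds every atom
```

Because $x$ does not change with $n$, the cutoff eventually exceeds any fixed part of its distribution, which is the mechanism behind the $1 + o(1)$ statement.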

Next, we recenter $\tilde{a}_i$ by subtracting away its mean, and we call the result $\hat{a}_i$. Note that this
recentering does not change the variance. We will use the following version of Talagrand's inequality,
quoted from [27, Theorem 5.2] (see also [HJ Corollary 4.10]):

Theorem 4.4 (Talagrand's inequality). Let $D$ be the unit disk $\{z \in \mathbb{C}, |z| \le 1\}$. For every product
probability $\mu$ on $D^n$, every convex 1-Lipschitz function $F : \mathbb{C}^n \to \mathbb{R}$, and every $r \ge 0$,

\[ \mu\big(|F - M(F)| \ge r\big) \le 4\exp(-r^2/8), \]

where $M(F)$ denotes the median of $F$.

Let X = (ai, 02, . . . , On), and let /i be the distribution on D" given by X/2nV2-°/4. Let 
4,VF) , and note that F is convex and 1-Lipschitz, which follows since 



F 



dist 



X 



2 ^^"^^ \^2ni/2-Q/4 

dist(X/2ni/2-°/4, W) is both convex and 1-Lipscliitz (and also using the fact that dist(X/2ni/2-°/4, W) < 
1, since G W). 

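By Theorem 4.4 with $r = 3n^{\alpha/4}$, we have

\[ \Pr\Big(\big|\operatorname{dist}(X, W)^2 - M(\operatorname{dist}(X, W)^2)\big| \ge 12\,n^{\alpha/4} n^{1-\alpha/2}\Big) \le 4\exp(-n^{\alpha/2}), \]

which implies that

\[ \Pr\Big(\operatorname{dist}(X, W)^2 \le M(\operatorname{dist}(X, W)^2) - 12\,n^{1-\alpha/4}\Big) \le 4\exp(-n^{\alpha/2}). \tag{15} \]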



Recall that F = ^ dist (^^^jry^r^, VFj . Using Talagrand's inequality (Theorem I4.4p again, we 
will show that the mean of F is very close to the median of F. We compute 



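\[ |\mathbb{E}(F) - M(F)| \le \mathbb{E}|F - M(F)| = \int_0^\infty \Pr\big(|F - M(F)| \ge t\big)\,dt \le \int_0^\infty 4\exp(-t^2/8)\,dt \le 8\sqrt{2\pi}. \]

Thus, we have shown that

\[ \big|\mathbb{E}(\operatorname{dist}(X, W)^2) - M(\operatorname{dist}(X, W)^2)\big| \le (32\sqrt{2\pi})\,n^{1-\alpha/2}. \tag{16} \]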
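The Gaussian integral used to compare the mean and the median has a closed form, and it can be checked independently. The sketch below (function names are hypothetical) evaluates $\int_0^\infty 4\exp(-t^2/8)\,dt$ through the error function and through a midpoint rule; the value is $4\sqrt{2\pi}$, which is within the bound $8\sqrt{2\pi}$ used for Inequality (16).

```python
import math

def gaussian_tail_integral(upper):
    """Closed form of the integral of 4*exp(-t^2/8) over [0, upper]."""
    # d/dt erf(t / (2*sqrt(2))) = exp(-t^2/8) / sqrt(2*pi)
    return 4.0 * math.sqrt(2.0 * math.pi) * math.erf(upper / (2.0 * math.sqrt(2.0)))

def riemann_check(upper=40.0, steps=40000):
    """Midpoint-rule approximation of the same integral, as an independent check."""
    h = upper / steps
    return h * sum(4.0 * math.exp(-((i + 0.5) * h) ** 2 / 8.0) for i in range(steps))

print(gaussian_tail_integral(1e3), riemann_check(), 8.0 * math.sqrt(2.0 * math.pi))
```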
Lemma 4.5. $\mathbb{E}(\operatorname{dist}(X, W)^2) = (1 + o(1))(n - d)$.

Proof. Let $\pi := (\pi_{ij})$ denote the orthogonal projection matrix to $W^{\perp}$. Note that $\operatorname{dist}(X, W)^2 =
\sum_{i=1}^{n} \sum_{j=1}^{n} \hat{a}_i \pi_{ij} \overline{\hat{a}_j}$. Since the $\hat{a}_i$ are iid, mean zero random variables, we have

\[ \mathbb{E}(\operatorname{dist}(X, W)^2) = \mathbb{E}(|\hat{a}_1|^2) \sum_{i=1}^{n} \pi_{ii} = \mathbb{E}(|\hat{a}_1|^2)\operatorname{tr}(\pi). \]

The proof is completed by applying Lemma 4.3 and noting that the trace of $\pi$ is $n - d$. □
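The trace identity in this proof can be verified exactly in a small real example. The sketch below is an illustration only: it assumes real coordinates that are independent random signs (so the second moment is exactly 1), builds a random 2-dimensional subspace of $\mathbb{R}^6$ by Gram-Schmidt, and averages the squared distance over all $2^6$ sign vectors.

```python
import itertools
import random

def orthonormal_basis(vectors):
    """Gram-Schmidt on real vectors (assumed linearly independent)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(x * y for x, y in zip(w, b))
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = sum(x * x for x in w) ** 0.5
        basis.append([x / norm for x in w])
    return basis

def dist_squared(x, basis):
    """Squared distance from x to the span of an orthonormal basis."""
    proj_sq = sum(sum(a * b for a, b in zip(x, e)) ** 2 for e in basis)
    return sum(a * a for a in x) - proj_sq

def mean_dist_squared_over_signs(n, basis):
    """Exact expectation of dist(X, W)^2 when X has iid +-1 coordinates."""
    total = sum(dist_squared(x, basis) for x in itertools.product((-1.0, 1.0), repeat=n))
    return total / 2 ** n

rng = random.Random(1)
n, d = 6, 2
W = orthonormal_basis([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)])
expected = mean_dist_squared_over_signs(n, W)
print(expected)  # agrees with n - d up to rounding
```

The cross terms cancel in the average because distinct coordinates are independent with mean zero, leaving only the diagonal of the projection matrix.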

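From Inequality (15), we see that it is sufficient to show that

\[ M(\operatorname{dist}(X, W)^2) - 12\,n^{1-\alpha/4} \ge c^2 (n - d). \]

Using Inequality (16) and Lemma 4.5 we have for sufficiently large $n$ that

\[ \begin{aligned} M(\operatorname{dist}(X, W)^2) - 12\,n^{1-\alpha/4} &\ge \mathbb{E}(\operatorname{dist}(X, W)^2) - (32\sqrt{2\pi})\,n^{1-\alpha/2} - 12\,n^{1-\alpha/4} \\ &\ge \Big(c^2 + \frac{1-c^2}{2}\Big)(n - d) - (32\sqrt{2\pi})\,n^{1-\alpha/2} - 12\,n^{1-\alpha/4} \\ &\ge c^2 (n - d) + \Big(\frac{1-c^2}{2}\Big) n^{1-\alpha/6} - (32\sqrt{2\pi})\,n^{1-\alpha/2} - 12\,n^{1-\alpha/4} \\ &\ge c^2 (n - d), \end{aligned} \]

where the last inequality follows from the fact that $\big(\frac{1-c^2}{2}\big) n^{1-\alpha/6} - (32\sqrt{2\pi})\,n^{1-\alpha/2} - 12\,n^{1-\alpha/4}$ is
a positive quantity for sufficiently large $n$. Combining the above computation with Inequality (15)
completes the proof of Proposition 4.2. □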

5 Proof of Lemma 3.3

Lemma 3.3 follows directly from the slightly more detailed statement in Lemma 5.1 given below.
In this section, we will prove Lemma 5.1; the proof closely follows the proof of [27, Lemma 4.3] with some
changes. The biggest difference from the proof of [27, Lemma 4.3] is in the proof of Lemma 5.3,
where we must adapt the approach of Krishnapur from [27, Appendix C] to a sparse setting (see
Lemma 5.5). This is one critical juncture where it seems like it would take a new idea to prove
almost sure convergence in place of convergence in probability. One possible approach would be
proving a sparse version of [1] (which is used in [27] in the proof of almost sure convergence in the
non-sparse case). Lemma 5.5 is also the only place where the assumption of independence between
the real and imaginary parts is used, and it would be interesting to see if this assumption could
be removed. Other notable differences from the proof of [27, Lemma 4.3] are that we must use
Proposition 4.2 in place of [27, Proposition 5.1] and that we will keep track of a lower bound on $\delta$,
which simplifies some steps in the proof.

Lemma 5.1. For every $\varepsilon_1 > 0$ and for all sufficiently small $\varepsilon_2 > 0$, where $\varepsilon_2$ depends on $\varepsilon_1$ and
other constants, the following holds. For every $\delta > 0$ satisfying

4<s< 



401og(l/e2) 
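we have with probability $1 - O(\varepsilon_1)$

\[ \Big| \frac{1}{n} \sum_{1 \le i \le (1-\delta)n} \Big[ \log \operatorname{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big) - \log \operatorname{dist}\Big(\frac{1}{\sqrt{n}} Y_i, W_i\Big) \Big] \Big| = O(\varepsilon_2) \]

for all but finitely many $n$.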

As shown in [27, Section 6], it is sufficient to prove that with probability $1 - O(\varepsilon_1)$ we have



^y'log ( ^cri(A„ „/) ) - log ( ^o-i(5„.„/) 
n' V V / V V 



0(^2) 



(17) 



for all but finitely many $n$, where $n' = \lfloor (1-\delta)n \rfloor$, where $\sigma_i(A)$ denotes the $i$-th largest singular
value of a matrix $A$, and where $A_{n,n'}$ denotes the matrix consisting of the first $n'$ rows of $A_n$ and
$B_{n,n'}$ denotes the matrix consisting of the first $n'$ rows of $B_n$.
Proving Equation (17) is equivalent to showing



\ogtdVn,n'{t) 



0(62), 



(18) 



where $d\nu_{n,n'}$ is defined by the difference of the two relevant ESDs, namely:



nrt' 71,77. nr^^ ft' 



Following [27], we can prove Equation (18) by dividing the range of $t$ into a few parts, which
follows from Lemmas 5.2 (for large $t$), 5.3 (for intermediate-sized $t$), and 5.4 (for small $t$).

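Lemma 5.2 (Region of large $t$). For every $\varepsilon_1 > 0$, there exist constants $\varepsilon_2 > 0$ and $R_{\varepsilon_2}$ such that
with probability $1 - O(\varepsilon_1)$ we have

\[ \int_{R_{\varepsilon_2}}^{\infty} |\log t|\,|d\nu_{n,n'}(t)| \le \varepsilon_2. \]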



Proof. By Lemma 2.2 and [27, Lemma A.2], we have that $\int_0^\infty t\,|d\nu_{n,n'}(t)|$ is bounded in probability.
Thus, there exists a constant $C_{\varepsilon_1}$ depending on $\varepsilon_1$ such that with probability $1 - O(\varepsilon_1)$ we have

\[ \int_0^\infty t\,|d\nu_{n,n'}(t)| \le C_{\varepsilon_1}. \]



Choose $\varepsilon_2 > 0$ sufficiently small with respect to $\varepsilon_1$ and $C_{\varepsilon_1}$ so that



1 > 2C,,e2log 



Set R, 



lea ~ \^) ' assume without loss of generality that Re2 > e. Note that is increasing 
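for $t \ge R_{\varepsilon_2} \ge e$; thus, by the definition of $\varepsilon_2$ we have

\[ \frac{C_{\varepsilon_1}}{\varepsilon_2} \log(t) \le t \]

whenever $t \ge R_{\varepsilon_2}$. Thus, we have with probability $1 - O(\varepsilon_1)$ that

\[ \int_{R_{\varepsilon_2}}^{\infty} |\log t|\,|d\nu_{n,n'}(t)| \le \int_0^\infty \frac{\varepsilon_2}{C_{\varepsilon_1}}\, t\,|d\nu_{n,n'}(t)| \le \varepsilon_2. \]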



□ 



Lemma 5.3 (Region of intermediate t, namely e\ < t < Re2)- Define a smooth function ip(t) 
which equals 1 on the interval [e|,i2e2]i supported on the interval [e|/2, 2i?e2]? ^-^ monotonically 
increasing on (e|/2,e|), and is monotonically decreasing on {R^^,2R^2). 



Then, with probability 1 — 0(ei) we have 

Tp{t)log{t) dVn,n'{t) 



0{e2) 



so long «S'^< ioiOT^- 

The main step in this proof is applying Lemma 5.5, whereas in the analogous step in the
non-sparse case, [27] uses a result of Dozier and Silverstein [3], which proves almost sure convergence of
the relevant distributions (rather than convergence in probability, which is the limit of Lemma 5.5).
It would be interesting to see if a sparse analog of [1] is possible, especially since this might provide
a way to remove the hypothesis requiring that the real and imaginary parts be independent, and it
might further be one of the necessary components to proving a universality result for sparse random
matrices with almost sure convergence instead of convergence in probability.

Proof. Using [27, Lemma A.1] and the upper bound on $\delta$, it is possible to show that

\[ \Big| \int_0^\infty \psi(t)\log(t)\,d\nu_{n,n'}(t) \Big| \le \Big| \int_0^\infty \psi(t)\log(t)\,d\nu_{n,n}(t) \Big| + O(\varepsilon_2). \]



(A possible alternative to the step above would be proving an analog of Lemma 5.5 for rectangular
$n$ by $n'$ matrices.)

By Lemma 5.5 (see Subsection 5.1), we know that $d\nu_{n,n}$ converges in probability to zero, and
thus

\[ \Big| \int_0^\infty \psi(t)\log(t)\,d\nu_{n,n}(t) \Big| = O(\varepsilon_2), \]

completing the proof. □



The last step in proving Equation (18) and thus completing the proof of Lemma 5.1 is the
following lemma:

Lemma 5.4 (Region of small t, namely < t < < 5^). With probability 1, we have 



|logt| \dUn,n'{t)\ = 0(e2) 



SO long as S < h 



log{l/e2) 



1/4 



Proof. The required upper bound on 6 follows from the assumption that 6 < 40 \og{i/e2) ' '^^^ proof is 
the same as the proof of [27, Lemma 6.6], with the small change that one must use Proposition 4.2
in place of [27, Proposition 5.1]. □
5.1 Applying a theorem of Chatterjee

In this subsection, we follow the ideas used by Krishnapur in [27, Appendix C], where a
central-limit-type theorem due to Chatterjee [3] was used to prove a universality result for random matrices
with independent but not necessarily identically distributed entries. Lemma 5.5 below is an analog
of [27, Lemma C.3]. Recall that $I_p$ is an iid copy of the random variable taking the value 1 with
probability $p$ and the value 0 with probability $1 - p$, where $p = n^{-1+\alpha}$ and $0 < \alpha \le 1$ is a
constant.

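Lemma 5.5. Let $\mathbf{X} = (X^{(0)}_{11}, X^{(1)}_{11}, X^{(0)}_{12}, X^{(1)}_{12}, \dots)$ be an array of $2n^2$ independent real random
variables, each of which is an iid copy of $X I_p/\sqrt{p}$, where $X$ is mean zero, variance 1, and let $\mathbf{Y} =
(Y^{(0)}_{11}, Y^{(1)}_{11}, Y^{(0)}_{12}, Y^{(1)}_{12}, \dots)$ be another array of $2n^2$ independent real random variables, each of which
is an iid copy of a mean zero, variance 1 random variable $Y$. Let $A_n(\mathbf{X})$ denote the $n$ by $n$ random
matrix having $X^{(0)}_{ij} + \sqrt{-1}\,X^{(1)}_{ij}$ for the $(i,j)$ entry, and similarly for $A_n(\mathbf{Y})$. Let $\mu_{\frac{1}{n} A_n(\mathbf{X}) A_n(\mathbf{X})^*}$
and $\mu_{\frac{1}{n} A_n(\mathbf{Y}) A_n(\mathbf{Y})^*}$ denote the ESDs of $\frac{1}{n} A_n(\mathbf{X}) A_n(\mathbf{X})^*$ and $\frac{1}{n} A_n(\mathbf{Y}) A_n(\mathbf{Y})^*$, respectively. Then,
$\mu_{\frac{1}{n} A_n(\mathbf{X}) A_n(\mathbf{X})^*} - \mu_{\frac{1}{n} A_n(\mathbf{Y}) A_n(\mathbf{Y})^*}$ converges in probability to zero as $n \to \infty$.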

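Proof. Our approach will be applying [3, Theorem 1.1] in a similar way to [27, Lemma C.3].
Let $H_n(\mathbf{x})$ denote the $2n$ by $2n$ Hermitian matrix

\[ H_n(\mathbf{x}) := \frac{1}{\sqrt{n}} \begin{pmatrix} 0 & A_n(\mathbf{x})^* \\ A_n(\mathbf{x}) & 0 \end{pmatrix}. \]

Note that the eigenvalues of $H_n(\mathbf{X})$ with multiplicity are exactly the positive and negative
square roots of the eigenvalues with multiplicity of $\frac{1}{n} A_n(\mathbf{X}) A_n(\mathbf{X})^*$. Also, the same fact applies to
$H_n(\mathbf{Y})$ and $\frac{1}{n} A_n(\mathbf{Y}) A_n(\mathbf{Y})^*$. We will now follow the computation given in [3, Section 2.4]. It is
sufficient to show that $\mu_{H_n(\mathbf{X})} - \mu_{H_n(\mathbf{Y})}$ converges in probability to zero as $n \to \infty$.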

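Let $u, v \in \mathbb{R}$ with $v \ne 0$ and let $z = u + \sqrt{-1}\,v$. Define a function $f : \mathbb{R}^{2n^2} \to \mathbb{C}$ by

\[ f(\mathbf{x}) = \frac{1}{2n} \operatorname{tr}\big((H_n(\mathbf{x}) - zI)^{-1}\big). \]

Here $\mathbf{x} = (x^{(k)}_{ij})_{1 \le i,j \le n;\ k \in \{0,1\}}$, where $x^{(0)}_{ij}$ corresponds to the real part (namely, $X^{(0)}_{ij}$ or $Y^{(0)}_{ij}$) and
$x^{(1)}_{ij}$ corresponds to the imaginary part (namely, $X^{(1)}_{ij}$ or $Y^{(1)}_{ij}$).
Define $G : \mathbb{R}^{2n^2} \to \mathbb{C}^{(2n)^2}$ by

\[ G(\mathbf{x}) = (H_n(\mathbf{x}) - zI)^{-1}. \]

All eigenvalues of $H_n(\mathbf{x})$ are real, and thus all eigenvalues of $H_n(\mathbf{x}) - zI$ are non-zero (since
$v \ne 0$). Thus, $G(\mathbf{x})$ is well-defined. From the matrix inversion formula, each entry of $G(\mathbf{x})$ is a
rational expression in $x^{(k)}_{ij}$ for $1 \le i, j \le n$ and $k \in \{0, 1\}$. Thus $G$ is infinitely differentiable in each
coordinate $x^{(k)}_{ij}$.

In the remainder of this section, we will use the shorthand $G$ for $G(\mathbf{x})$ and the shorthand $H$ for
$H_n(\mathbf{x})$.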
Note that

\[ \frac{\partial G}{\partial x^{(k)}_{ij}} = -G\,\frac{\partial H}{\partial x^{(k)}_{ij}}\,G \tag{19} \]

(this can be seen by using the product rule and differentiating both sides of the equation $(H_n(\mathbf{x}) -
zI)G = I$). The following three formulas follow from Equation (19) and the fact that $\operatorname{tr}(AB) =
\operatorname{tr}(BA)$ for any two square matrices $A$ and $B$:



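\[ \frac{\partial f}{\partial x^{(k)}_{ij}} = -\frac{1}{2n} \operatorname{tr}\Big(\frac{\partial H}{\partial x^{(k)}_{ij}}\,G^2\Big), \]

\[ \frac{\partial^2 f}{(\partial x^{(k)}_{ij})^2} = \frac{1}{n} \operatorname{tr}\Big(\frac{\partial H}{\partial x^{(k)}_{ij}}\,G\,\frac{\partial H}{\partial x^{(k)}_{ij}}\,G^2\Big), \]

and

\[ \frac{\partial^3 f}{(\partial x^{(k)}_{ij})^3} = -\frac{3}{n} \operatorname{tr}\Big(\frac{\partial H}{\partial x^{(k)}_{ij}}\,G\,\frac{\partial H}{\partial x^{(k)}_{ij}}\,G\,\frac{\partial H}{\partial x^{(k)}_{ij}}\,G^2\Big). \]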

As in [3, Section 2.4], we will use the following facts to bound the partial derivatives of $f$. Note
that $|\operatorname{tr}(AB)| \le \|A\|_2 \|B\|_2$. Also, for $A$ a $k$ by $k$ normal matrix with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_k$ and
$B$ any square matrix, we have $\max\{\|AB\|_2, \|BA\|_2\} \le (\max_{1 \le i \le k} |\lambda_i|)\,\|B\|_2$. By the definition of
$G$, it is clear that the absolute value of the largest eigenvalue of $G$ is at most $|v|^{-1}$. Also, by the
definition of $H$, it is clear that $\frac{\partial H}{\partial x^{(k)}_{ij}}$ is the matrix having $(\sqrt{-1})^k n^{-1/2}$ for the $(n + i, j)$ entry,
having $(-\sqrt{-1})^k n^{-1/2}$ for the $(j, n + i)$ entry, and having zero for all other entries.
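Thus, for all $1 \le i, j \le n$ and $k \in \{0, 1\}$, we have that

\[ \Big| \frac{\partial f}{\partial x^{(k)}_{ij}} \Big| \le \frac{1}{2n} \Big\| \frac{\partial H}{\partial x^{(k)}_{ij}} \Big\|_2 \|G^2\|_2 \le \frac{1}{2n} \sqrt{\frac{2}{n}}\; |v|^{-2} \sqrt{2n} = \frac{|v|^{-2}}{n}. \]

By similar means, we can compute

\[ \Big| \frac{\partial^2 f}{(\partial x^{(k)}_{ij})^2} \Big| \le \frac{1}{n} \Big\| \frac{\partial H}{\partial x^{(k)}_{ij}} \Big\|_2^2 |v|^{-3} = \frac{2|v|^{-3}}{n^2} \]

and

\[ \Big| \frac{\partial^3 f}{(\partial x^{(k)}_{ij})^3} \Big| \le \frac{3}{n} \Big\| \frac{\partial H}{\partial x^{(k)}_{ij}} \Big\|_2^3 |v|^{-4} = \frac{6\sqrt{2}\,|v|^{-4}}{n^{5/2}}. \]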
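We will now apply the main theorem from [3]. First, we need the following definitions for a
function $h : \mathbb{R}^N \to \mathbb{C}$. Let

\[ \lambda_2(h) := \sup\Big\{ \Big|\frac{\partial h}{\partial x_i}\Big|^2,\ \Big|\frac{\partial^2 h}{(\partial x_i)^2}\Big| \Big\} \]

and let

\[ \lambda_3(h) := \sup\Big\{ \Big|\frac{\partial h}{\partial x_i}\Big|^3,\ \Big|\frac{\partial^2 h}{(\partial x_i)^2}\Big|^{3/2},\ \Big|\frac{\partial^3 h}{(\partial x_i)^3}\Big| \Big\}, \]

where each supremum is taken over $1 \le i \le N$ and over all points of $\mathbb{R}^N$.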



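Theorem 5.6. [3] Let $\mathbf{X} = (X_1, \dots, X_N)$ and $\mathbf{Y} = (Y_1, \dots, Y_N)$ be lists of independent, real-valued
random variables such that $\mathbb{E}(X_i) = \mathbb{E}(Y_i)$ and $\mathbb{E}(X_i^2) = \mathbb{E}(Y_i^2)$ for all $1 \le i \le N$. Let $h : \mathbb{R}^N \to \mathbb{R}$
be thrice differentiable in each argument. If we set $U = h(\mathbf{X})$ and $V = h(\mathbf{Y})$, then for any thrice
differentiable $g : \mathbb{R} \to \mathbb{R}$ and any $K > 0$,

\[ \begin{aligned} |\mathbb{E}\,g(U) - \mathbb{E}\,g(V)| \le{} & C_1(g)\,\lambda_2(h) \sum_{i=1}^{N} \Big( \mathbb{E}(X_i^2;\ |X_i| > K) + \mathbb{E}(Y_i^2;\ |Y_i| > K) \Big) \\ & + C_2(g)\,\lambda_3(h) \sum_{i=1}^{N} \Big( \mathbb{E}(|X_i|^3;\ |X_i| \le K) + \mathbb{E}(|Y_i|^3;\ |Y_i| \le K) \Big), \end{aligned} \]

where $C_1(g) = \|g'\|_\infty + \|g''\|_\infty$ and $C_2(g) = \frac{1}{6}\|g'\|_\infty + \frac{1}{2}\|g''\|_\infty + \frac{1}{6}\|g'''\|_\infty$.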



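Thus, for our function $f$ we have

\[ \lambda_2(f) \le \sup\Big\{ \frac{|v|^{-4}}{n^2},\ \frac{2|v|^{-3}}{n^2} \Big\} \]

and

\[ \lambda_3(f) \le \sup\Big\{ \frac{|v|^{-6}}{n^3},\ \frac{\sqrt{8}\,|v|^{-9/2}}{n^3},\ \frac{6\sqrt{2}\,|v|^{-4}}{n^{5/2}} \Big\}. \]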



Theorem 5.6 requires $h$ to be a real-valued function, thus we will apply Theorem 5.6 to $\operatorname{Re}(f)$
and $\operatorname{Im}(f)$ separately. Given $g : \mathbb{R} \to \mathbb{R}$ a thrice differentiable function, set $U = \operatorname{Re}(f(\mathbf{X}))$ and
$V = \operatorname{Re}(f(\mathbf{Y}))$, where $\mathbf{X}$ and $\mathbf{Y}$ are as in the statement of Lemma 5.5 (notationally, set $N = 2n^2$
and define $X_\ell$ by $X_{2n(i-1)+2(j-1)+k} := X^{(k)}_{ij}$). Noting that $\lambda_r(\operatorname{Re} f) \le \lambda_r(f)$, we may apply
Theorem 5.6 to get



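\[ |\mathbb{E}\,g(U) - \mathbb{E}\,g(V)| \le C_1(g)\,\lambda_2(f) \sum_{k \in \{0,1\}} \sum_{i=1}^{n} \sum_{j=1}^{n} \Big( \mathbb{E}\big((X^{(k)}_{ij})^2;\ |X^{(k)}_{ij}| > K\big) + \mathbb{E}\big((Y^{(k)}_{ij})^2;\ |Y^{(k)}_{ij}| > K\big) \Big) \tag{20} \]

\[ \qquad\qquad + C_2(g)\,\lambda_3(f) \sum_{k \in \{0,1\}} \sum_{i=1}^{n} \sum_{j=1}^{n} \Big( \mathbb{E}\big(|X^{(k)}_{ij}|^3;\ |X^{(k)}_{ij}| \le K\big) + \mathbb{E}\big(|Y^{(k)}_{ij}|^3;\ |Y^{(k)}_{ij}| \le K\big) \Big). \tag{21} \]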



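Choose $K = \varepsilon\sqrt{n}$, where $\varepsilon > 0$ is a small positive constant. The triple-sum term in Display (21) is
bounded by $\varepsilon$ times a constant depending only on $g$ and $v$ (here, we used that $\mathbb{E}(|X|^3;\ |X| \le K) \le
K\,\mathbb{E}(X^2)$ for any real random variable $X$). Also, the triple-sum term in Display (20) is bounded by
another constant depending only on $g$ and $v$ times the quantity

\[ \frac{1}{n^2} \sum_{k \in \{0,1\}} \sum_{i=1}^{n} \sum_{j=1}^{n} \Big( \mathbb{E}\big((X^{(k)}_{ij})^2;\ |X^{(k)}_{ij}| > \varepsilon\sqrt{n}\big) + \mathbb{E}\big((Y^{(k)}_{ij})^2;\ |Y^{(k)}_{ij}| > \varepsilon\sqrt{n}\big) \Big). \]

Since the random variables $Y^{(k)}_{ij}$ do not change with $n$, it is clear from monotone convergence that
$\mathbb{E}\big((Y^{(k)}_{ij})^2;\ |Y^{(k)}_{ij}| > \varepsilon\sqrt{n}\big) \to 0$ as $n \to \infty$. Thus, it is sufficient to show that $\mathbb{E}\big((X^{(k)}_{ij})^2;\ |X^{(k)}_{ij}| >
\varepsilon\sqrt{n}\big) \to 0$ as $n \to \infty$. Recall that $X^{(k)}_{ij}$ is an iid copy of $X I_p/\sqrt{p}$, where $X$ is mean zero, variance
1. We have that

\[ \mathbb{E}\Big( \Big(\frac{X I_p}{\sqrt{p}}\Big)^2;\ \Big|\frac{X I_p}{\sqrt{p}}\Big| > \varepsilon\sqrt{n} \Big) \le \mathbb{E}\Big( \Big(\frac{X I_p}{\sqrt{p}}\Big)^2;\ |X| > \varepsilon\sqrt{pn} \Big) = \mathbb{E}\big( |X|^2;\ |X| > \varepsilon\sqrt{pn} \big), \]

where the last equality follows by the independence of $I_p$ and $X$. Finally, by monotone convergence
again, we see that $\mathbb{E}\big(|X|^2;\ |X| > \varepsilon\sqrt{pn}\big) \to 0$ as $n \to \infty$, completing the proof. □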
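The final truncated second moment can be made concrete for one specific choice of $X$. The sketch below assumes, purely for illustration, that $X$ is a standard Gaussian, for which $\mathbb{E}(X^2;\ |X| > c)$ has a closed form; since $pn = n^{\alpha}$, the cutoff $\varepsilon\sqrt{pn}$ tends to infinity for every fixed $\varepsilon > 0$ and $\alpha > 0$.

```python
import math

def gaussian_tail_second_moment(c):
    """E(X^2; |X| > c) for a standard Gaussian X and c >= 0."""
    # Integration by parts gives 2*(c*phi(c) + 1 - Phi(c)), written with erfc below.
    return math.erfc(c / math.sqrt(2.0)) + c * math.sqrt(2.0 / math.pi) * math.exp(-c * c / 2.0)

def sparse_tail_term(eps, n, alpha):
    """E(|X|^2; |X| > eps*sqrt(p*n)) with p = n**(alpha - 1), so that p*n = n**alpha."""
    return gaussian_tail_second_moment(eps * n ** (alpha / 2.0))

for n in (1e4, 1e6, 1e8):
    print(n, sparse_tail_term(0.1, n, 0.5))
```

The decay is slower for small $\alpha$, because the cutoff grows only like $n^{\alpha/2}$; this is one place where the assumption that $\alpha$ is a positive constant enters.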
A Sparse law of large numbers

Below we give a weak law of large numbers for sparse random variables of the sort considered in 
this paper. One of the ingredients to proving a universality result for the ESDs of sparse random 
matrices with almost sure convergence (rather than convergence in probability) would likely be a 
strong version of the sparse law of large numbers. 

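Lemma A.1 (Sparse Law of Large Numbers). Let $\xi$ be a complex random variable such that
$\mathbb{E}|\xi| < \infty$. Let $X$ be a sparse version of $\xi$ with parameter $\alpha$, namely $X := I_p \xi / p$, where $p = n^{-1+\alpha}$
and $0 < \alpha \le 1$ is a constant. Let $m$ be a function of $n$ such that $m = m(n) \ge n$. Then, if $X_i$ is
an iid copy of $X$ for all $1 \le i \le m$, we have for every $\varepsilon > 0$ that

\[ \lim_{n \to \infty} \Pr\Big( \Big| \frac{1}{m} \sum_{i=1}^{m} X_i - \mathbb{E}(\xi) \Big| > \varepsilon \Big) = 0, \]

i.e., $\frac{1}{m} \sum_{i=1}^{m} X_i$ converges to $\mathbb{E}(\xi)$ in probability.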

The proof below follows the general description given in 

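Proof. We want to show for any small constant $\varepsilon > 0$ that $\frac{1}{m} \sum_{i=1}^{m} X_i = \mathbb{E}(\xi) + O(\varepsilon)$ with probability
at least $1 - O(\varepsilon)$.

By Lemma 1.9 we know that $\mathbb{E}\big(\mathbf{1}_{\{|X| > n^{1-\alpha/2}\}} |X|\big) \to 0$ and $\mathbb{E}\big(\mathbf{1}_{\{|X| \le n^{1-\alpha/2}\}} X\big) \to \mathbb{E}(\xi)$ as $n \to \infty$; and thus, we can choose $n$
sufficiently large so that $\mathbb{E}\big(\mathbf{1}_{\{|X| > n^{1-\alpha/2}\}} |X|\big) \le \varepsilon^2/4$ and $\big|\mathbb{E}\big(\mathbf{1}_{\{|X| \le n^{1-\alpha/2}\}} X\big) - \mathbb{E}(\xi)\big| \le \varepsilon/4$.
Note that

\[ \frac{1}{m} \sum_{i=1}^{m} X_i = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}_{\{|X_i| \le n^{1-\alpha/2}\}} X_i + \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}_{\{|X_i| > n^{1-\alpha/2}\}} X_i. \]

We thus have that

\[ \Pr\Big( \Big| \frac{1}{m} \sum_{i=1}^{m} X_i - \mathbb{E}(\xi) \Big| > \varepsilon \Big) \le \Pr\Big( \Big| \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}_{\{|X_i| \le n^{1-\alpha/2}\}} X_i - \mathbb{E}(\xi) \Big| > \varepsilon/2 \Big) + \Pr\Big( \Big| \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}_{\{|X_i| > n^{1-\alpha/2}\}} X_i \Big| > \varepsilon/2 \Big). \tag{22} \]

Applying Chebyshev's inequality to the first term and using the fact that $\big|\mathbb{E}\big(\mathbf{1}_{\{|X| \le n^{1-\alpha/2}\}} X\big) - \mathbb{E}(\xi)\big| \le
\varepsilon/4$, we have that

\[ \Pr\Big( \Big| \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}_{\{|X_i| \le n^{1-\alpha/2}\}} X_i - \mathbb{E}(\xi) \Big| > \varepsilon/2 \Big) \le \frac{16\, n^{1-\alpha/2}\, \mathbb{E}|X|}{m\, \varepsilon^2}. \]

Applying Markov's inequality to the second term, we have

\[ \Pr\Big( \Big| \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}_{\{|X_i| > n^{1-\alpha/2}\}} X_i \Big| > \varepsilon/2 \Big) \le \varepsilon/2. \]

Plugging these two estimates into Inequality (22) and using the facts that $\mathbb{E}(|X|) = \mathbb{E}(|\xi|)$ and
$m \ge n$, we have that

\[ \Pr\Big( \Big| \frac{1}{m} \sum_{i=1}^{m} X_i - \mathbb{E}(\xi) \Big| > \varepsilon \Big) \le \frac{16\, \mathbb{E}(|\xi|)}{\varepsilon^2\, n^{\alpha/2}} + \frac{\varepsilon}{2}, \]

which is less than $\varepsilon$ for $n$ chosen sufficiently large with respect to $\varepsilon$, $\alpha$, and $\mathbb{E}(|\xi|)$. □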
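The sparse law of large numbers can be watched in a small simulation. The sketch below is illustrative only: it assumes a real-valued $\xi$ with the exponential distribution of mean 1 (the lemma allows complex $\xi$), takes $m = n$, and uses a fixed seed; the function name is hypothetical.

```python
import random

def sparse_sample_mean(n, alpha, m, seed=0):
    """Average of m iid copies of X = I_p * xi / p, with xi ~ Exp(1), so E(xi) = 1."""
    rng = random.Random(seed)
    p = n ** (alpha - 1.0)
    total = 0.0
    for _ in range(m):
        if rng.random() < p:                   # I_p = 1 with probability p
            total += rng.expovariate(1.0) / p  # nonzero entries are scaled up by 1/p
    return total / m

print(sparse_sample_mean(200000, 0.5, 200000))
```

Only about $mp \ge n^{\alpha}$ of the summands are nonzero, so the average behaves like a sample mean over roughly $n^{\alpha}$ observations; this is why convergence is slow when $\alpha$ is small.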